A Parallel Corpus Labeled Using Open and Restricted Domain Ontologies

Authors

  • Ester Boldrini
  • Sergio Ferrández
  • Rubén Izquierdo
  • David Tomás
  • José Luis Vicedo González
Abstract

The analysis and creation of annotated corpora are fundamental for implementing natural language processing solutions based on machine learning. In this paper we present a parallel corpus of 4500 questions in Spanish and English in the tourism domain, obtained from real users. With the aim of training a question answering system, the questions were labeled with the expected answer type according to two different ontologies. The first is an open domain ontology based on Sekine’s Extended Named Entity Hierarchy, while the second is a restricted domain ontology specific to the tourism field. Because the two ontologies have different characteristics, we had to resolve many problematic cases and adapt our annotation to the characteristics of each one. We present an analysis of the domain coverage of these ontologies and the results of the inter-annotator agreement. Finally we use a question classification system to evaluate the labeling of

Similar resources

Identification of Ontological Relations Using Formal Concept Analysis

In this paper we present an approach for the automatic identification of relations in restricted-domain ontologies. We use the evidence found in a corpus associated with the same domain as the ontology to determine the validity of the ontological relations. Our approach employs formal concept analysis, a method used for the analysis of data, but in this case used for relation discovery in ...

Building frame-based corpus on the basis of ontological domain knowledge

Semantic Role Labeling (SRL) plays a key role in many NLP applications. The development of SRL systems for the biomedical domain is frustrated by the lack of large domain-specific corpora that are labeled with semantic roles. Corpus development has been very expensive and time-consuming. In this paper we propose a method for building a frame-based corpus on the basis of domain knowledge provided b...

A Study of Ontologies Developed on the Basis of Open Biomedical Ontologies Principles

Background and Aim: Ontologies facilitate data integration, exchange, searching and querying. Open Biomedical Ontologies (OBO) Foundry is a solution for creating reference ontologies. In this foundry, the design of ontologies is based on established principles which allow for their interactions as a single system. The purpose of this study is to determine the main features of ontologies develop...

Deploying Semantic Resources for Open Domain Question Answering

This thesis investigates how semantic resources can be deployed to improve the accuracy of an open domain question answering (QA) system. In particular, two types of semantic resources have been utilized to answer factoid questions: (1) Semantic parsing techniques are applied to analyze questions for semantic structures and to find phrases in the knowledge source that match these structures. (2...

Domain-Specific Ontology Mapping by Corpus-Based Semantic Similarity

Mapping heterogeneous ontologies is usually performed manually by domain experts, or accomplished by computer programs via comparing the structures of the ontologies and the linguistic semantics of their concepts. In this work, we take a different approach to compare and map the concepts of heterogeneous domain-specific ontologies by using a document corpus in a domain similar to the domain of ...

Publication date: 2009